Machine translation evaluation inside QARLA

Authors

  • Jesús Giménez
  • Enrique Amigó
  • Chiori Hori
Abstract

In this work we present the fundamentals of the IQMT framework for MT evaluation. IQMT offers a common workbench on which existing evaluation metrics can be utilized. We suggest the IQ measure and test it on the Chinese-to-English data from the IWSLT 2004 Evaluation Campaign. We show how the correlation with human assessments at the system level improves substantially for most individual metrics. Moreover, IQMT allows several metrics to be combined robustly, avoiding scaling problems and metric weightings. Several metric combinations were tried, but correlations did not improve significantly further.
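The abstract does not spell out the IQ measure, but it builds on the QUEEN measure of the underlying QARLA framework, which combines metrics by comparing only same-metric scores and so needs no scaling or weights. Below is a minimal Python sketch of a QUEEN-style estimate, assuming the published definition (the probability that the candidate is at least as similar to one reference as two other references are to each other, under every metric simultaneously); queen, unigram_overlap, and the sample sentences are illustrative stand-ins, not part of IQMT.

    import itertools

    def queen(candidate, references, metrics):
        # Probability, over ordered triples of distinct references
        # (r, r1, r2), that the candidate is at least as close to r as
        # r1 is to r2 under EVERY metric at once. Only same-metric
        # scores are compared, so metrics on different scales combine
        # without normalization or per-metric weights.
        triples = list(itertools.permutations(references, 3))
        if not triples:
            return 0.0
        wins = sum(
            1
            for r, r1, r2 in triples
            if all(m(candidate, r) >= m(r1, r2) for m in metrics)
        )
        return wins / len(triples)

    # Toy similarity metric (a stand-in for BLEU, NIST, etc.):
    # Jaccard overlap of word sets.
    def unigram_overlap(a, b):
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / max(len(wa | wb), 1)

    score = queen(
        "the cat sits on the mat",
        ["the cat sat on the mat",
         "a cat is sitting on the mat",
         "there is a cat on the mat"],
        [unigram_overlap],
    )

Because the candidate must beat the reference-to-reference similarity under every metric simultaneously, adding a metric can only tighten the test, which is how the combination avoids metric weightings.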


Similar articles

IQmt: A Framework for Automatic Machine Translation Evaluation

Abstract We present the IQMT Framework for Machine Translation Evaluation Inside QARLA. IQMT offers a common workbench in which evaluation metrics can be utilized and combined. It provides i) a measure to evaluate the quality of any set of similarity metrics (KING), ii) a measure to evaluate the quality of a translation using a set of similarity metrics (QUEEN), and iii) a measure to evaluate t...
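The KING and QUEEN measures mentioned above are commonly stated as probabilities over the set of human models R, the set of automatic peers A, and the metric set X. The following LaTeX is a reconstruction from the QARLA literature, not a quotation from this truncated abstract:

    \mathrm{QUEEN}_{X,R}(a) = \Pr_{(r,\,r',\,r'') \in R^3}\bigl( \forall x \in X :\; x(a,r) \ge x(r',r'') \bigr)

    \mathrm{KING}_{A,R}(X) = \Pr_{r \in R}\bigl( \forall a \in A :\; \mathrm{QUEEN}_{X,\,R\setminus\{r\}}(a) \le \mathrm{QUEEN}_{X,\,R\setminus\{r\}}(r) \bigr)

In words, QUEEN scores a peer by how often it is at least as close to a model as two models are to each other, and KING scores a metric set by how often a held-out model outranks every automatic peer under QUEEN.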


MT Evaluation: Human-Like vs. Human Acceptable

We present a comparative study on Machine Translation Evaluation according to two different criteria: Human Likeness and Human Acceptability. We provide empirical evidence that there is a relationship between these two kinds of evaluation: Human Likeness implies Human Acceptability but the reverse is not true. From the point of view of automatic evaluation this implies that metrics based on Hum...


Evaluating DUC 2004 Tasks with the QARLA Framework

This paper reports the application of the QARLA evaluation framework to the DUC 2004 testbed (tasks 2 and 5). Our experiment addresses two issues: how well QARLA evaluation measures correlate with human judgements, and what additional insights the QARLA framework can provide to the DUC evaluation exercises.


The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are at the core of Machine Translation (MT) engines, whose development relies on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...


Evaluación de resúmenes automáticos mediante QARLA (Evaluating automatic summaries with QARLA)

This article shows an application of the QARLA evaluation framework to DUC-2004 (tasks 2 and 5). The QARLA framework makes it possible, first, to evaluate summaries with regard to different features and, second, to combine and meta-evaluate different similarity metrics, giving more weight to metrics that characterize the models (manual summaries) with respect to automatic summaries.




Journal:

Volume:   Issue:

Pages: -

Publication date: 2005